The Annals of Statistics 

2008, Vol. 36, No. 1, 310-336 

DOI: 10.1214/009053607000000721 

(c) Institute of Mathematical Statistics, 2008 

NONLINEAR ESTIMATION FOR LINEAR INVERSE PROBLEMS 
WITH ERROR IN THE OPERATOR

By Marc Hoffmann and Markus Reiss 

University of Marne-la-Vallee and University of Heidelberg 

We study two nonlinear methods for statistical linear inverse 
problems when the operator is not known. The two constructions 
combine Galerkin regularization and wavelet thresholding. Their
performances depend on the underlying structure of the operator,
quantified by an index of sparsity. We prove their rate-optimality and
adaptivity properties over Besov classes. 

1. Introduction. 

Linear inverse problems with error in the operator. We want to recover
$f \in L^2(\mathcal{D})$, where $\mathcal{D}$ is a domain in $\mathbb{R}^d$, from data

(1.1)    $g_\varepsilon = Kf + \varepsilon W$,

where $K$ is an unknown linear operator $K : L^2(\mathcal{D}) \to L^2(\mathcal{Q})$, $\mathcal{Q}$ is a domain
in $\mathbb{R}^q$, $W$ is Gaussian white noise and $\varepsilon > 0$ is the noise level. We do not
know $K$ exactly, but we have access to

(1.2)    $K_\delta = K + \delta B$.

The process $K_\delta$ is a blurred version of $K$, polluted by a Gaussian operator
white noise $B$ with a noise level $\delta > 0$. The operator $K$ acting on $f$ is
unknown and treated as a nuisance parameter. However, preliminary statistical
inference about $K$ is possible, with an accuracy governed by $\delta$. Another
equivalent approach is to consider that for experimental reasons we never
have access to $K$ in practice, but rather to $K_\delta$. The error level $\delta$ can be
linked to the accuracy of supplementary experiments; see Efromovich and
Koltchinskii [11] and the examples below. In most interesting cases
$K^{-1}$ is not continuous and the estimation problem (1.1) is ill-posed (e.g., see
Nussbaum and Pereverzev [16] and Engl, Hanke and Neubauer [12]).

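The statistical model is thus given by the observation $(g_\varepsilon, K_\delta)$. Asymptotics
are taken as $\delta, \varepsilon \to 0$ simultaneously. In probabilistic terms, observable
quantities take the form

$\langle g_\varepsilon, k\rangle := \langle Kf, k\rangle_{L^2(\mathcal{Q})} + \varepsilon\langle W, k\rangle \quad \forall k \in L^2(\mathcal{Q})$

and

$\langle K_\delta h, k\rangle := \langle Kh, k\rangle_{L^2(\mathcal{Q})} + \delta\langle Bh, k\rangle \quad \forall (h,k) \in L^2(\mathcal{D}) \times L^2(\mathcal{Q})$.

The mapping $k \in L^2(\mathcal{Q}) \mapsto \langle W, k\rangle$ defines a centered Gaussian linear form,
with covariance

$E[\langle W, k_1\rangle\langle W, k_2\rangle] = \langle k_1, k_2\rangle_{L^2(\mathcal{Q})}, \quad k_1, k_2 \in L^2(\mathcal{Q})$.

Likewise, $(h,k) \in L^2(\mathcal{D}) \times L^2(\mathcal{Q}) \mapsto \langle Bh, k\rangle$ defines a centered Gaussian
bilinear form with covariance

$E[\langle Bh_1, k_1\rangle\langle Bh_2, k_2\rangle] = \langle h_1, h_2\rangle_{L^2(\mathcal{D})}\langle k_1, k_2\rangle_{L^2(\mathcal{Q})}$.

If $(h_i)_{i\ge 1}$ and $(k_i)_{i\ge 1}$ form orthonormal bases of $L^2(\mathcal{D})$ and $L^2(\mathcal{Q})$, respectively
(in particular, we will consider hereafter wavelet bases), the infinite vector
$(\langle W, k_j\rangle)_{j\ge 1}$ and the infinite matrix $(\langle Bh_i, k_j\rangle)_{i,j\ge 1}$ have i.i.d. standard
Gaussian entries. Another description of the operator white noise is given
by stochastic integration using a Brownian sheet, which can be interpreted
as a white noise model for kernel observations; see Section 2 below.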
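The sequence-space form of the model can be illustrated with a small simulation. This is only a sketch under stated assumptions: the operator and the signal are given as finite coefficient arrays in orthonormal bases, and the function name `simulate_observations`, the toy diagonal operator and the toy signal are hypothetical choices that do not come from the paper.

```python
import numpy as np

def simulate_observations(K, f, eps, delta, rng):
    """Draw (g_eps, K_delta) for coefficient arrays in orthonormal bases."""
    K = np.asarray(K, dtype=float)
    f = np.asarray(f, dtype=float)
    W = rng.standard_normal(K.shape[0])  # i.i.d. N(0,1) coefficients <W, k_j>
    B = rng.standard_normal(K.shape)     # i.i.d. N(0,1) entries <B h_i, k_j>
    g_eps = K @ f + eps * W              # discretized version of (1.1)
    K_delta = K + delta * B              # discretized version of (1.2)
    return g_eps, K_delta

rng = np.random.default_rng(0)
n = 64
K = np.diag(1.0 / np.arange(1, n + 1))   # toy smoothing operator (assumption)
f = 1.0 / np.arange(1, n + 1) ** 2       # toy signal coefficients (assumption)
g_eps, K_delta = simulate_observations(K, f, eps=1e-3, delta=1e-3, rng=rng)
```

Setting `eps` and `delta` to zero returns the noiseless pair, which is a convenient sanity check.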

Main results. The interplay between $\delta$ and $\varepsilon$ is crucial: if $\delta \ll \varepsilon$, one
expects to recover model (1.1) with a known $K$. On the other hand, we will
exhibit a different picture if $\varepsilon \ll \delta$. Even when the error $\varepsilon$ in the signal $g_\varepsilon$
dominates $\delta$, the assumption $\delta \neq 0$ has to be handled carefully. We restrict
our attention to the case $\mathcal{Q} = \mathcal{D}$ and nonnegative operators $K$ on $L^2(\mathcal{D})$.

We first consider a linear estimator based on the Galerkin projection
method. For functions in the $L^2$-Sobolev space $H^s$ and suitable approximation
spaces, the linear estimator converges with the minimax rate
$\max\{\delta, \varepsilon\}^{2s/(2s+2t+d)}$, where $t > 0$ is the degree of ill-posedness of $K$.

For spatially inhomogeneous functions, like smooth functions with jumps,
linear estimators cannot attain optimal rates of convergence; see, for example,
Donoho and Johnstone [10]. Therefore we propose two nonlinear methods
by separating the two steps of Galerkin inversion (I) and adaptive
smoothing (S), which provides two strategies:

Nonlinear Estimation I:  $(g_\varepsilon, K_\delta) \xrightarrow{(I)} \hat f^{\mathrm{lin}}_{\delta,\varepsilon} \xrightarrow{(S)} \hat f^{I}_{\delta,\varepsilon}$,
Nonlinear Estimation II: $(g_\varepsilon, K_\delta) \xrightarrow{(S)} (\hat g_\varepsilon, \hat K_\delta) \xrightarrow{(I)} \hat f^{II}_{\delta,\varepsilon}$,

where $\hat f^{\mathrm{lin}}_{\delta,\varepsilon}$ is a preliminary and undersmoothed linear estimator. We use
a Galerkin scheme on a high-dimensional space as inversion procedure (I)
and wavelet thresholding as adaptive smoothing technique (S), with a
level-dependent thresholding rule in Nonlinear Estimation I and a noise reduction
in the operator by entrywise thresholding of the wavelet matrix representation
in Nonlinear Estimation II. To our knowledge, thresholding for the
operator is new in a statistical framework.

From both mathematical and numerical perspectives, the inversion step 
is critical: we cannot choose an arbitrarily large approximation space for 
the inversion, even in Nonlinear Estimation II. Nevertheless, both methods 
are provably rate-optimal (up to a log factor in some cases for the second 
method) over a wide range of (sparse) nonparametric classes, expressed in 
terms of Besov spaces with p<2. 

Organization of the paper. Section 2 discusses related approaches. The 
theory of linear and nonlinear estimation is presented in Sections 3 to 5. 
Section 6 discusses the numerical implementation. The proofs of the main 
theorems are deferred to Section 7 and the Appendix provides technical 
results and some tools from approximation theory. 

2. Related approaches with error in the operator. 

Perturbed singular values. Adhering to a singular-value decomposition
approach, Cavalier and Hengartner [3] assume that the singular functions
of $K$ are known, but not its singular values. Examples include convolution
operators. By an oracle-inequality approach, they show how to reconstruct
$f$ efficiently when $\delta \le \varepsilon$.

Physical devices. We are given an integral equation $Kf = g$ on a closed
boundary surface $\Gamma$, where the boundary integral operator

$Kh(x) = \int_\Gamma k(x,y)h(y)\,\sigma_\Gamma(dy)$

is of order $t > 0$, that is, $K : H^{-t/2}(\Gamma) \to H^{t/2}(\Gamma)$ is given by a smooth kernel
$k(x,y)$ as a function of $x$ and $y$ off the diagonal, but which is typically
singular on the diagonal. Such kernels arise, for instance, by applying a
boundary integral formulation to second-order elliptic problems. Examples
include the single-layer potential operator in Section 6.2 below or Abel-type
operators with $k(x,y) = b(x,y)/|x - y|^\beta$ on $\Gamma = [0,1]$ for some $\beta > 0$ (see, e.g.,
Dahmen, Harbrecht and Schneider [6]). Assuming that $k$ is tractable only
up to some experimental error, we postulate the knowledge of $dk_\delta(x,y) =
dk(x,y) + \delta\,dB(x,y)$, where $B$ is a Brownian sheet. Assuming moreover that
our data $g$ is perturbed by measurement noise as in (1.1), we recover our
abstract framework.

Statistical inference. The widespread econometric model of instrumental
variables (e.g., Hall and Horowitz [13]) is given by i.i.d. observations
$(X_i, Y_i, W_i)$ for $i = 1, \dots, n$, where $(X_i, Y_i)$ follow a regression model

$Y_i = g(X_i) + U_i$

with the exception that $E[U_i \mid X_i] \neq 0$, but under the additional information
given by the instrumental variables $W_i$ that satisfy $E[U_i \mid W_i] = 0$. Denoting
by $f_{XW}$ the joint density of $X$ and $W$, we define

$k(x,z) := \int f_{XW}(x,w) f_{XW}(z,w)\,dw$,

$Kh(x) := \int k(x,z)h(z)\,dz$.

To draw inference on $g$, we use the identity $Kg(x) = E[E[Y \mid W] f_{XW}(x, W)]$.
The data easily allow estimation of the right-hand side and of the kernel
function $k$. We face exactly an ill-posed inverse problem with errors in the
operator, except for certain correlations between the two noise sources and
for the fact that the noise is caused by a density estimation problem. Note
that $K$ has a symmetric nonnegative kernel and is therefore self-adjoint and
nonnegative on $L^2$. Hall and Horowitz [13] obtain in their Theorem 4.2 the
linear rate of Section 3 when replacing their terms as follows: $\varepsilon = \delta = n^{-1/2}$,
$t = \alpha$, $s = \beta + 1/2$, $d = 1$.

In other statistical problems random matrices or operators are of key 
importance or even the main subject of interest, for instance the linear 
response function in functional data analysis (e.g., Cai and Hall [2]) or the 
empirical covariance operator for stochastic processes (e.g., Reiss [17]). 

Numerical discretization. Even if the operator is known, the numerical 
analyst is confronted with the same question of error in the operator under 
a different angle: up to which accuracy should the operator be discretized? 
Even more importantly, by not using all available information on the operator
the objects typically have a sparse data structure and thus require far
less memory and time of computation; see Dahmen, Harbrecht and
Schneider [6].

3. A linear estimation method. In the following, we write $a \lesssim b$ when
$a \le cb$ for some constant $c > 0$ and $a \sim b$ when $a \lesssim b$ and $b \lesssim a$ simultaneously.
The uniformity in $c$ will be obvious from the context.

3.1. The linear Galerkin estimator. We briefly study a linear projection
estimator. Given $s > 0$ and $M > 0$, we first consider the Sobolev ball

$W^s(M) := \{f \in H^s;\ \|f\|_{H^s} \le M\}$

as parameter space for the unknown $f$. Pick some $j \ge 0$ and let
$V_j = \mathrm{span}\{\psi_\lambda, |\lambda| \le j\}$ denote an approximation space associated with a
$(\lfloor s \rfloor + 1)$-regular multiresolution analysis $(V_j)$; see Appendix A.6. We look
for an estimator $\hat f_{\delta,\varepsilon} \in V_j$, solution to

(3.1)    $\langle K_\delta \hat f_{\delta,\varepsilon}, v\rangle = \langle g_\varepsilon, v\rangle$ for all $v \in V_j$.

This only makes sense if $K_\delta$ restricted to $V_j$ is invertible. We introduce the
Galerkin projection (or stiffness matrix) of an operator $T$ onto $V_j$ by setting
$T_j := P_j T|_{V_j}$, where $P_j$ is the orthogonal projection onto $V_j$, and set formally

(3.2)    $\hat f_{\delta,\varepsilon} := \begin{cases} K_{\delta,j}^{-1} P_j g_\varepsilon, & \text{if } \|K_{\delta,j}^{-1}\|_{V_j \to V_j} \le \tau 2^{jt}, \\ 0, & \text{otherwise,} \end{cases}$

where $\|T_j\|_{V_j \to V_j} = \sup_{v \in V_j, \|v\|_{L^2} = 1} \|T_j v\|$ denotes the norm of the operator
$T_j : (V_j, \|\cdot\|_{L^2}) \to (V_j, \|\cdot\|_{L^2})$. The estimator $\hat f_{\delta,\varepsilon}$ is specified by the level
$j$ and the cut-off parameter $\tau > 0$ (and the choice of the multiresolution
analysis).
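In coefficient form, the cut-off rule (3.2) amounts to solving a linear system unless the inverse is too large in operator norm. The following is a minimal sketch, assuming the Galerkin matrix of $K_\delta$ on $V_j$ and the projected data are already available as NumPy arrays; the function name `linear_galerkin_estimator` is hypothetical.

```python
import numpy as np

def linear_galerkin_estimator(K_delta_j, g_j, tau, j, t):
    """Sketch of (3.2): invert unless ||K_{delta,j}^{-1}|| exceeds tau * 2**(j*t)."""
    K_delta_j = np.asarray(K_delta_j, dtype=float)
    g_j = np.asarray(g_j, dtype=float)
    # Singular values are returned in descending order; the spectral norm
    # of the inverse equals 1 / (smallest singular value).
    s_min = np.linalg.svd(K_delta_j, compute_uv=False)[-1]
    if s_min == 0.0 or 1.0 / s_min > tau * 2.0 ** (j * t):
        return np.zeros(K_delta_j.shape[1])  # cut-off branch: estimator is 0
    return np.linalg.solve(K_delta_j, g_j)
```

Using the smallest singular value avoids forming the inverse explicitly, which is a numerical convenience and not something prescribed by the text.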

3.2. Result. The ill-posedness comes from the fact that $K^{-1}$ is not
$L^2$-continuous: we quantify the smoothing action by a degree of ill-posedness
$t > 0$, which indicates that $K$ behaves roughly like $t$-fold integration. This
is precisely defined by the following ellipticity condition in terms of the
$L^2$-Sobolev norm $\|\cdot\|_{H^s}$ of regularity $s \in \mathbb{R}$; see Appendix A.6.

Assumption 3.1. $K$ is self-adjoint on $L^2(\mathcal{D})$, $K : L^2 \to H^t$ is continuous
and $\langle Kf, f\rangle \sim \|f\|^2_{H^{-t/2}}$.

As proved in Appendix A.6, Assumption 3.1 implies that the following
"mapping constant" of $K$ with respect to the given multiresolution analysis
$(V_j)$ is finite:

(3.3)    $Q(K) := \sup_{j \ge 0} 2^{-jt} \|K_j^{-1}\|_{V_j \to V_j}$.

Introduce the integrated mean square error

$\mathcal{R}(\hat f, f) := E[\|\hat f - f\|^2_{L^2(\mathcal{D})}]$

for an estimator $\hat f$ of $f$ and the rate exponent

$r(s,t,d) := \dfrac{2s}{2s + 2t + d}$.



Proposition 3.2. Let $Q > 0$. If the linear estimator $\hat f_{\delta,\varepsilon}$ is specified by
$2^j \sim \max\{\delta, \varepsilon\}^{-2/(2s+2t+d)}$ and $\tau > Q$, then

$\sup_{f \in W^s(M)} \mathcal{R}(\hat f_{\delta,\varepsilon}, f) \lesssim \max\{\delta, \varepsilon\}^{2r(s,t,d)}$

holds uniformly over $K$ satisfying Assumption 3.1 with $Q(K) \le Q$.

The normalized rate $\max\{\delta, \varepsilon\}^{r(s,t,d)}$ gives the explicit interplay between $\varepsilon$
and $\delta$ and is indeed optimal over operators $K$ satisfying Assumption 3.1 with
$Q(K) \le Q$; see Section 5.2 below. Proposition 3.2 is essentially contained in
Efromovich and Koltchinskii [11], but is proved in Section 7.1 as a central
reference for the nonlinear results.
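To make the level choice of Proposition 3.2 concrete, one can evaluate the rate exponent and an integer resolution level numerically. This is a sketch only: the relation $2^j \sim \max\{\delta,\varepsilon\}^{-2/(2s+2t+d)}$ fixes $j$ up to an additive constant, so rounding to the nearest integer is an assumption made here for illustration, and both function names are hypothetical.

```python
import math

def rate_exponent(s, t, d):
    """r(s, t, d) = 2s / (2s + 2t + d)."""
    return 2.0 * s / (2.0 * s + 2.0 * t + d)

def galerkin_level(delta, eps, s, t, d):
    """Integer j with 2**j close to max(delta, eps)**(-2 / (2s + 2t + d)).

    Rounding to the nearest integer is an illustrative convention; the
    theory only determines j up to a constant.
    """
    noise = max(delta, eps)
    return max(0, round(-2.0 * math.log2(noise) / (2.0 * s + 2.0 * t + d)))
```

For instance, with $s = t = d = 1$ the exponent is $2/5$, and noise levels of order $10^{-3}$ lead to a level around $j = 4$ under this convention.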

4. Two nonlinear estimation methods. 

4.1. Nonlinear Estimation I. For $x > 0$ and two resolution levels
$0 \le j_0 < j_1$, define the level-dependent hard-thresholding operator $S_x$ acting on
$L^2(\mathcal{D})$ by

(4.1)    $S_x(h) := \sum_{|\lambda| \le j_1} \langle h, \psi_\lambda\rangle \mathbf{1}_{\{|\langle h, \psi_\lambda\rangle| \ge \kappa 2^{|\lambda| t} x \sqrt{(|\lambda| - j_0)_+}\}} \psi_\lambda$,

for some constant $\kappa > 0$ and where $(\psi_\lambda)$ is a regular wavelet basis generating
the multiresolution analysis $(V_j)$. Our first nonlinear estimator is defined by

(4.2)    $\hat f^{I}_{\delta,\varepsilon} := S_{\max\{\delta,\varepsilon\}}(\hat f_{\delta,\varepsilon})$,

where $\hat f_{\delta,\varepsilon}$ is the linear estimator (3.2) specified by the level $j_1$ and $\tau > 0$.

The factor $2^{|\lambda| t}$ in the threshold takes into account the increase in the noise
level after applying the operator $K_{\delta,j_1}^{-1}$. The additional term $\sqrt{(|\lambda| - j_0)_+}$ is
chosen to attain the exact minimax rate in the spirit of Delyon and Juditsky
[8]. Hence, the nonlinear estimator $\hat f^{I}_{\delta,\varepsilon}$ is specified by $j_0$, $j_1$, $\tau$ and $\kappa$.
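The level-dependent rule (4.1) can be sketched on an array of wavelet coefficients. The sketch assumes that each coefficient comes with its integer resolution level $|\lambda|$ in a parallel array; this storage layout and the function name `threshold_level_dependent` are illustrative assumptions.

```python
import numpy as np

def threshold_level_dependent(coeffs, levels, x, kappa, t, j0, j1):
    """Hard thresholding in the spirit of (4.1).

    coeffs[i] is a wavelet coefficient and levels[i] its resolution level.
    Coefficients above level j1 are discarded; the others are kept when
    |coeff| >= kappa * 2**(level * t) * x * sqrt((level - j0)_+).
    """
    coeffs = np.asarray(coeffs, dtype=float)
    levels = np.asarray(levels)
    thresh = kappa * 2.0 ** (levels * t) * x * np.sqrt(np.maximum(levels - j0, 0))
    keep = (levels <= j1) & (np.abs(coeffs) >= thresh)
    return np.where(keep, coeffs, 0.0)
```

Since the threshold vanishes for levels up to $j_0$, all coefficients on those coarse levels are kept, which matches the role of $j_0$ as the level of the preliminary linear fit.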

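4.2. Nonlinear Estimation II. Our second method is conceptually different:
we use matrix compression to remove the operator noise by thresholding
$K_\delta$ in a first step and then apply the Galerkin inversion on the smoothed
data $\hat g_\varepsilon$. Let

(4.3)    $\hat K_\delta := S^{\mathrm{op}}_\delta(K_{\delta,J})$,

where $K_{\delta,J} = P_J K_\delta|_{V_J}$ is the Galerkin projection of the observed operator
and $S^{\mathrm{op}}_\delta$ is a hard-thresholding rule applied to the entries in the wavelet
representation of the operator:

(4.4)    $T_J \mapsto S^{\mathrm{op}}_\delta(T_J) := \sum_{|\lambda|, |\lambda'| \le J} T_{\lambda,\lambda'} \mathbf{1}_{\{|T_{\lambda,\lambda'}| \ge T(\delta)\}} \langle \cdot, \psi_\lambda\rangle \psi_{\lambda'}$,

where $T(x) = \kappa x \sqrt{|\log x|}$ and $T_{\lambda,\lambda'} := \langle T\psi_\lambda, \psi_{\lambda'}\rangle$.

The estimator of the data is obtained by the classical hard-thresholding
rule for noisy signals:

(4.5)    $\hat g_\varepsilon := \sum_{|\lambda| \le J} \langle g_\varepsilon, \psi_\lambda\rangle \mathbf{1}_{\{|\langle g_\varepsilon, \psi_\lambda\rangle| \ge T(\varepsilon)\}} \psi_\lambda$.

After this preliminary step, we invert the linear system on the multiresolution
space $V_J$ to obtain our second nonlinear estimator:

(4.6)    $\hat f^{II}_{\delta,\varepsilon} := \begin{cases} \hat K_\delta^{-1} \hat g_\varepsilon, & \text{if } \|\hat K_\delta^{-1}\|_{V_J \to V_J} \le \tau 2^{Jt}, \\ 0, & \text{otherwise.} \end{cases}$

The nonlinear estimator $\hat f^{II}_{\delta,\varepsilon}$ is thus specified by $J$, $\kappa$ and $\tau$. Observe that
this time we do not use level-dependent thresholds since we threshold the
empirical coefficients directly.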
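The three steps of Nonlinear Estimation II (threshold the operator entries, threshold the data, invert with a cut-off) can be sketched as follows. The sketch assumes that the wavelet matrix of $K_\delta$ on $V_J$ and the wavelet coefficients of $g_\varepsilon$ are given as NumPy arrays and that $\delta, \varepsilon < 1$; all function names are hypothetical, and the same $\kappa$ is used for both thresholds for simplicity.

```python
import numpy as np

def universal_threshold(x, kappa):
    """T(x) = kappa * x * sqrt(|log x|) for a noise level 0 < x < 1."""
    return kappa * x * np.sqrt(abs(np.log(x)))

def hard_threshold(a, level):
    """Keep entries with |a| >= level, set the others to zero."""
    a = np.asarray(a, dtype=float)
    return np.where(np.abs(a) >= level, a, 0.0)

def nonlinear_estimator_II(K_delta_J, g_J, delta, eps, kappa, tau, J, t):
    """Sketch of (4.3)-(4.6) in wavelet coefficients."""
    K_hat = hard_threshold(K_delta_J, universal_threshold(delta, kappa))
    g_hat = hard_threshold(g_J, universal_threshold(eps, kappa))
    s_min = np.linalg.svd(K_hat, compute_uv=False)[-1]
    if s_min == 0.0 or 1.0 / s_min > tau * 2.0 ** (J * t):
        return np.zeros(K_hat.shape[1])  # cut-off branch: estimator is 0
    return np.linalg.solve(K_hat, g_hat)
```

Entrywise thresholding of the matrix is what distinguishes this scheme from the linear estimator: small off-diagonal entries that are indistinguishable from operator noise are set to zero before the inversion.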

5. Results for the nonlinear estimators. 

5.1. The setting. The nonlinearity of our two estimators permits us to
consider wider ranges of function classes for our target: we measure the
smoothness $s$ of $f$ in $L^p$-norm, with $1 \le p \le 2$, in terms of Besov spaces $B^s_{p,p}$. The
minimax rates of convergence are computed over Besov balls

$V^s_p(M) := \{f \in B^s_{p,p};\ \|f\|_{B^s_{p,p}} \le M\}$

with radius $M > 0$. We show that an elbow in the minimax rates is given by
the critical line

(5.1)    $\dfrac{1}{p} = \dfrac{1}{2} + \dfrac{s}{2t + d}$,

considering $t$ and $d$ as fixed by the model setting. Equation (5.1) is linked
to the geometry of inhomogeneous sparse signals that can be recovered in
$L^2$-error after the action of $K$; see Donoho [9]. We retrieve the framework
of Section 3 using $H^s = B^s_{2,2}$.

We prove in Section 5.2 that the first nonlinear estimator $\hat f^{I}_{\delta,\varepsilon}$ achieves
the optimal rate over Besov balls $V^s_p(M)$. In Section 5.3 we further show
that, under some mild restriction, the nonlinear estimator $\hat f^{II}_{\delta,\varepsilon}$ is adaptive
in $s$ and nearly rate-optimal, losing a logarithmic factor in some cases.

5.2. Minimax rates of convergence. In the following, we fix $s_+ \in \mathbb{N}$ and
pick a wavelet basis $(\psi_\lambda)_\lambda$ associated with an $s_+$-regular multiresolution
analysis $(V_j)$. The minimax rates of convergence are governed by the
parameters $s \in (0, s_+)$, $p > 0$ and separate into two regions:

dense region:  $\mathcal{P}_{\mathrm{dense}} := \Bigl\{(s,p) : \dfrac{1}{p} < \dfrac{1}{2} + \dfrac{s}{2t+d}\Bigr\}$,

sparse region: $\mathcal{P}_{\mathrm{sparse}} := \Bigl\{(s,p) : \dfrac{1}{p} > \dfrac{1}{2} + \dfrac{s}{2t+d}\Bigr\}$.

It is implicitly understood that $B^s_{p,p} \subseteq L^2$ holds, that is, by Sobolev
embeddings $s - d/p + d/2 \ge 0$. The terms dense and sparse refer to the form of
the priors used to construct the lower bounds. Note that an unavoidable
logarithmic term appears in the sparse case.

Theorem 5.1. Let $Q > 0$. Specify the first nonlinear estimator $\hat f^{I}_{\delta,\varepsilon}$
by $2^{j_0} \sim \max\{\delta, \varepsilon\}^{-2/(2s+2t+d)}$, $2^{j_1} \sim \max\{\delta, \varepsilon\}^{-1/t}$, $\tau > Q$ and $\kappa > 0$
sufficiently large.

• For $(s,p) \in \mathcal{P}_{\mathrm{dense}}$ and $p \ge 1$ we have

$\sup_{f \in V^s_p(M)} \mathcal{R}(\hat f^{I}_{\delta,\varepsilon}, f) \lesssim \max\{\delta, \varepsilon\}^{2r(s,t,d)}$,

uniformly over $K$ satisfying Assumption 3.1 with $Q(K) \le Q$.

• For $(s,p) \in \mathcal{P}_{\mathrm{sparse}}$ and $p \ge 1$ we have

$\sup_{f \in V^s_p(M)} \mathcal{R}(\hat f^{I}_{\delta,\varepsilon}, f) \lesssim \max\{\delta\sqrt{|\log \delta|}, \varepsilon\sqrt{|\log \varepsilon|}\}^{2\tilde r(s,p,t,d)}$,

uniformly over $K$ satisfying Assumption 3.1 with $Q(K) \le Q$, where now

$\tilde r(s,p,t,d) := \dfrac{s + d/2 - d/p}{s + t + d/2 - d/p}$.
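The split into a dense and a sparse region and the two rate exponents are simple enough to check numerically. The helper below is a sketch with hypothetical function names; it reports the boundary case of the critical line (5.1) separately, which is a choice made here for illustration.

```python
def dense_exponent(s, t, d):
    """r(s, t, d) = 2s / (2s + 2t + d)."""
    return 2.0 * s / (2.0 * s + 2.0 * t + d)

def sparse_exponent(s, p, t, d):
    """(s + d/2 - d/p) / (s + t + d/2 - d/p), the sparse-region exponent."""
    return (s + d / 2.0 - d / p) / (s + t + d / 2.0 - d / p)

def region(s, p, t, d):
    """Classify (s, p) relative to the critical line 1/p = 1/2 + s/(2t + d)."""
    boundary = 0.5 + s / (2.0 * t + d)
    if 1.0 / p < boundary:
        return "dense"
    if 1.0 / p > boundary:
        return "sparse"
    return "critical"
```

For example, with $t = d = 1$ the pair $(s,p) = (1,2)$ lies in the dense region, while $(s,p) = (1/2, 1)$ lies in the sparse region.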



A sufficient value for $\kappa$ can be made explicit by a careful study of Lemma
7.2 together with the proof of Delyon and Juditsky [8]; see the proofs below.

The rate obtained is indeed optimal in a minimax sense. The lower bound
in the case $\delta = 0$ is classical (Nussbaum and Pereverzev [16]) and will not
decrease for increasing noise levels $\delta$ or $\varepsilon$, whence it suffices to provide the
case $\varepsilon = 0$.

The following lower bound can be derived from Efromovich and
Koltchinskii [11] for $s > 0$, $p \in [1, \infty]$:

(5.2)    $\inf_{\hat f_\delta} \sup_{(f,K) \in \mathcal{F}_{s,p,t}} \mathcal{R}(\hat f_\delta, f) \gtrsim \delta^{2r(s,t,d)}$,

where the nonparametric class $\mathcal{F}_{s,p,t} = \mathcal{F}_{s,p,t}(M, Q)$ takes the form

$\mathcal{F}_{s,p,t} = V^s_p(M) \times \{K \text{ satisfying Assumption 3.1 with } Q(K) \le Q\}$.

For $(s,p) \in \mathcal{P}_{\mathrm{dense}}$ the lower bound matches the upper bound attained by
$\hat f^{I}_{\delta,\varepsilon}$. In Appendix A.5 we prove the following sparse lower bound:



Theorem 5.2. For $(s,p) \in \mathcal{P}_{\mathrm{sparse}}$ we have

(5.3)    $\inf_{\hat f_\delta} \sup_{(f,K) \in \mathcal{F}_{s,p,t}} \mathcal{R}(\hat f_\delta, f) \gtrsim (\delta\sqrt{|\log \delta|})^{2\tilde r(s,p,t,d)}$,

and also the sparse rate of the first estimator $\hat f^{I}_{\delta,\varepsilon}$ is optimal.

5.3. The adaptive properties of Nonlinear Estimation II. We first state
a general result which gives separate estimates for the two error levels of $\hat f^{II}_{\delta,\varepsilon}$
associated with $\delta$ and $\varepsilon$, respectively, leading to faster rates of convergence
than in Theorem 5.1 in the case of sparse operator discretizations.

Assumption 5.3. $K : B^s_{p,p} \to B^{s+t}_{p,p}$ is continuous.

Furthermore, we state an ad hoc hypothesis on the sparsity of $K$. It is
expressed in terms of the wavelet discretization of $K$ and is specified by
parameters $(\bar s, \bar p)$.

Assumption 5.4. For parameters $\bar s \ge 0$ and $\bar p > 0$ we have uniformly
over all multi-indices $\lambda$

$\|K\psi_\lambda\|_{B^{\bar s + t}_{\bar p, \bar p}} \lesssim 2^{|\lambda|(\bar s + d/2 - d/\bar p)}$.

Observe that this hypothesis follows from Assumption 5.3 with $(\bar s, \bar p) =
(s,p)$, $p \ge 1$, due to $\|\psi_\lambda\|_{B^s_{p,p}} \sim 2^{|\lambda|(s + d/2 - d/p)}$. The case $\bar p < 1$, however,
expresses high sparsity: if $K$ is diagonal in a regular wavelet basis with
eigenvalues of order $2^{-|\lambda| t}$, then Assumption 5.4 holds for all $\bar s, \bar p > 0$. For a less
trivial example of a sparse operator see Section 6.2. Technically,
Assumption 5.4 will allow us to control the error when thresholding the operator; see
Proposition 7.4.

Finally, we need to specify a restriction on the linear approximation error
expressed in terms of the regularity in $H^\sigma$:

(5.4)    $\sigma \ge s\Bigl(\dfrac{t + d}{s + t + d/2}\Bigr) \min\Bigl(\dfrac{\log \varepsilon}{\log \delta}, 1\Bigr)$ in the case $\delta > \varepsilon^{1 + d/t}$.

Then for $s \in (0, s_+)$, $p \ge 1$ and $\bar p > 0$ we obtain the following general result
in the dense case.

Theorem 5.5. Grant Assumptions 3.1, 5.3 and 5.4. Let $(s,p), (\bar s, \bar p) \in
\mathcal{P}_{\mathrm{dense}}$ satisfy

(5.5)    $\dfrac{2\bar s + d - 2d/\bar p}{2\bar s + 2t + d} \le \dfrac{2s - d}{2t + d}$ with strict inequality for $p > 1$

and assume restriction (5.4) for $\sigma \ge 0$. Choose $\kappa > 0$ and $\tau > 0$ sufficiently
large and specify $2^J \sim \min\{\varepsilon^{-1/t}, (\delta\sqrt{|\log \delta|})^{-1/(t+d)}\}$. Then

$\sup_{f \in V^s_p(M) \cap W^\sigma(M)} \mathcal{R}(\hat f^{II}_{\delta,\varepsilon}, f) \lesssim (\varepsilon\sqrt{|\log \varepsilon|})^{2r(s,t,d)} + (\delta\sqrt{|\log \delta|})^{2r(\bar s,t,d)}$.

The constant in the specification of $2^J$ cannot be too large; see the proof of
Proposition 7.5. While the bounds for $\tau$ and $\kappa$ are explicitly computable from
upper bounds on constants involved in the assumptions on the operator, they
are in practice much too conservative, as is well known in the signal detection
case (e.g., Donoho and Johnstone [10]) or the classical inverse problem case
(Abramovich and Silverman [1]).

Corollary 5.6. Grant Assumptions 3.1 and 5.3. Suppose $(s,p) \in \mathcal{P}_{\mathrm{dense}}$
and $\sigma \ge 0$ satisfies (5.4). Then

$\sup_{f \in V^s_p(M) \cap W^\sigma(M)} \mathcal{R}(\hat f^{II}_{\delta,\varepsilon}, f) \lesssim \max\{\varepsilon\sqrt{|\log \varepsilon|}, \delta\sqrt{|\log \delta|}\}^{2r(s,t,d)}$

follows from the smoothness restriction $s \ge (d^2 + 8(2t + d)(d - d/p))^{1/2}/4$,
in particular in the cases $p = 1$ or $s > d(1 + 2t/d)^{1/2}/2$.

If in addition $d/p \le d/2 + s(s - d/2)/(s + t + d/2)$ holds, then we get rid
of the linear restriction:

$\sup_{f \in V^s_p(M)} \mathcal{R}(\hat f^{II}_{\delta,\varepsilon}, f) \lesssim \max\{\varepsilon\sqrt{|\log \varepsilon|}, \delta\sqrt{|\log \delta|}\}^{2r(s,t,d)}$.

Proof. Set $\bar s = s$ and $\bar p = p$ and use that Assumption 5.3 implies
Assumption 5.4. Then the smoothness restriction implies (5.5) and Theorem
5.5 applies. The particular cases follow because $s$ and $p$ are in $\mathcal{P}_{\mathrm{dense}}$.

By Sobolev embeddings, $B^s_{p,p} \subseteq W^\sigma$ holds for $s - d/p \ge \sigma - d/2$ and the
last assertion follows by substituting in (5.4). □

We conclude that Nonlinear Estimation II attains the minimax rate up
to a logarithmic factor in the dense case, provided the smoothness $s$ is not
too small. For $(s,p) \in \mathcal{P}_{\mathrm{sparse}}$ the rate with exponent $\tilde r(s,p,t,d)$ is obtained
via the Sobolev embedding $B^s_{p,p} \subseteq B^\sigma_{\pi,\pi}$ with $s - d/p = \sigma - d/\pi$ such that
$(\sigma, \pi) \in \mathcal{P}_{\mathrm{dense}}$, and even exact rate-optimality follows in the sparse case.

6. Numerical implementation. 

6.1. Specification of the method. While the mapping properties of the
unknown operator $K$ along the scale of Sobolev or Besov spaces allow a
proper mathematical theory and a general understanding, it is per se an
asymptotic point of view: it is governed by the decay rate of the eigenvalues.
For finite samples only the eigenvalues in the Galerkin projection $K_J$ matter,
which will be close to the first $2^{Jd}$ eigenvalues of $K$. Consequently, even if
the degree of ill-posedness of $K$ is known in advance (as is the case, e.g., in
Reiss [17]), optimizing the numerical performance should rather rely on the
induced norm $\|\cdot\|_{K_J} := \|K_J^{-1}\,\cdot\|_{L^2}$ on $V_J$ and not on $\|\cdot\|_{H^t}$.

Another practical point is that the cut-off rule using $\tau$ in the definitions
(3.2) and (4.6) is not reasonable given just one sample, but needed to handle
possibly highly distorted observations. An obvious way out is to consider
only indices $J$ of the approximation space $V_J$ which are so small that $K_{\delta,J}$
remains invertible and not too ill-conditioned. Then the cut-off rule can be
abandoned and the parameter $\tau$ is obsolete.

The estimator $\hat f^{II}_{\delta,\varepsilon}$ is therefore specified by choosing an approximation
space $V_J$ and a thresholding constant $\kappa$. Since a thresholding rule is applied
to both signal and operator, possibly different values of $\kappa$ can be used. In
our experience, thresholds that are smaller than the theoretical bounds, but
slightly larger than good choices in classical signal detection work well; see
Abramovich and Silverman [1] for a similar observation.

The main constraint for selecting the subspace $V_J$ is that $J$ is not so large
that $K_{\delta,J}$ is far away from $K_J$. By a glance at condition (7.6) in the proof of
Theorem 5.5, working with $\|\cdot\|_{K_J}$ instead of $\|\cdot\|_{H^t}$ and with the observed
operator before thresholding, we want that

\\Ks,J - < 

with some $\rho \in (0,1)$. This reduces to $\|(\mathrm{Id} - \delta K_{\delta,J}^{-1} B_J)^{-1} - \mathrm{Id}\|_{V_J \to V_J} \le \rho$,
which by Lemma 7.1 is satisfied with very high probability provided

(6.1)    $\lambda_{\min}(K_{\delta,J}) \ge c\,\delta\sqrt{\dim(V_J)}$,

where $\lambda_{\min}(\cdot)$ denotes the minimal eigenvalue and $c > 0$ a constant depending
on $\rho$ and the desired confidence. Based on these arguments we propose
the following sequential data-driven rule to choose the parameter $J$:

(6.2)    $J := \min\{j \ge 0 \mid \lambda_{\min}(K_{\delta,j+1}) < c\,\delta\sqrt{\dim(V_{j+1})}\}$.

This rule might be slightly too conservative since after thresholding $\hat K_\delta$ will
be closer to $K_J$ than $K_{\delta,J}$. It is, however, faster to implement and the desired
confidence can be better tuned. In addition, a conservative choice of $J$ will
only affect the estimation of very sparse and irregular functions.
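The sequential rule for $J$ can be sketched as a loop over nested Galerkin matrices. Several points are assumptions made for this illustration: the matrices are passed as a list indexed by level, each matrix is symmetrized before computing eigenvalues so that the minimal eigenvalue is real, the comparison uses the square-root scaling $c\,\delta\sqrt{\dim V_{j+1}}$ suggested by (6.1), and the function name `choose_level` is hypothetical.

```python
import numpy as np

def choose_level(K_delta_by_level, delta, c):
    """Smallest j such that the level-(j+1) matrix fails the eigenvalue test.

    K_delta_by_level[j] is the Galerkin matrix of the observed operator on V_j.
    If no level fails, the largest available level is returned.
    """
    for j in range(len(K_delta_by_level) - 1):
        M = np.asarray(K_delta_by_level[j + 1], dtype=float)
        M = 0.5 * (M + M.T)                 # symmetrize (assumption)
        lam_min = np.linalg.eigvalsh(M)[0]  # eigenvalues in ascending order
        if lam_min < c * delta * np.sqrt(M.shape[0]):
            return j
    return len(K_delta_by_level) - 1
```

Because the loop stops at the first failing level, only small matrices are ever decomposed when the operator is strongly ill-posed, which fits the remark that the rule is fast to implement.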

6.2. A numerical example. We consider a single-layer logarithmic
potential operator that relates the density of the electric charge on a cylinder
of radius $r = 1/4$ to the induced potential on the same cylinder, when the
cylinder is assumed to be infinitely long and homogeneous in that direction.
Describing the angle by $e^{2\pi i x}$ with $x \in [0,1]$, the operator is given by

$Kf(x) = \int_0^1 k(x,y) f(y)\,dy$ with $k(x,y) = -\log\bigl(\tfrac{1}{2}|\sin(\pi(x - y))|\bigr)$.

The single-layer potential operator is known to satisfy a degree of ill-posedness
$t = 1$ because of its logarithmic singularity on the diagonal. In Cohen,
Hoffmann and Reiss [5] this operator has been used to demonstrate different
solution methods for inverse problems with known operator: the
singular-value decomposition (SVD), the linear Galerkin method and a nonlinear
Galerkin method which corresponds to Nonlinear Estimation II in the case
$\delta = 0$.

Fig. 1. Wavelet representation of $K$ (a) and $K_\delta$ (b). Panel titles: OperatorRepresentation (J=7); NoisyOperator (J=7, delta=0.001).

The aim here is to compare the performance of the presented methods
given that not $K$, but only a noisy version $K_\delta$ is available. Our focus is
on the reconstruction properties under noise in the operator and we choose
$\delta = 10^{-3}$, $\varepsilon = 10^{-4}$. As in Cohen, Hoffmann and Reiss [5] we consider the
tent function

$f(x) = \max\{1 - 30|x - \tfrac{1}{2}|, 0\}, \quad x \in [0,1]$,

as object to be estimated. Its spike at $x = 1/2$ will be difficult to reconstruct.
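The kernel and the test function of this example are easy to write down explicitly. The sketch below assumes the factor $1/2$ inside the logarithm, which corresponds to the chord length $2r|\sin(\pi(x-y))|$ on a circle of radius $r = 1/4$; the function names are hypothetical.

```python
import numpy as np

def kernel(x, y):
    """Logarithmic single-layer kernel k(x, y) = -log(|sin(pi (x - y))| / 2).

    Singular on the diagonal x == y, where it evaluates to +inf.
    """
    return -np.log(0.5 * np.abs(np.sin(np.pi * (x - y))))

def tent(x):
    """Tent function f(x) = max(1 - 30 |x - 1/2|, 0) on [0, 1]."""
    return np.maximum(1.0 - 30.0 * np.abs(np.asarray(x, dtype=float) - 0.5), 0.0)
```

The tent function is supported on an interval of length $1/15$ around $x = 1/2$, so most of its wavelet coefficients vanish away from the spike.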

For implementing the linear and the two nonlinear methods we use
Daubechies wavelets of order 8 (with an extremal phase choice). We
calculate the wavelet decomposition of $K$ and $f$ up to the scale $J_{\max} = 10$ by
Mallat's pyramidal algorithm. For the nonlinear methods the large space
$V_J$, on which the Galerkin inversion is performed, is determined by the rule
(6.2) with $c = 5$. Figure 1(a) shows the modulus of the wavelet discretization
$(|K_{\lambda,\mu}|)$ of the operator $K$ on $V_J$ with $J = 7$. Multi-indices with the same
resolution level $j$ are presented next to each other; the resolution level $j$
decreases from left to right and from bottom to top. The units are multiples
of $\delta$. The finger-like structure, showing large coefficients for low resolution
levels, along the diagonal and certain subdiagonals, is typical for wavelet
representations of integral (Calderon-Zygmund) operators and due to the
support properties of the wavelets; see, for example, Dahmen, Harbrecht
and Schneider [6]. In Figure 1(b) the modulus of the wavelet discretization
of the noisy observation $K_\delta$ is shown. The structures off the main diagonal
are hardly discernible.

The performance of the methods for this simulation setup is very stable
for different noise realizations. In Figure 2(a) a typical linear estimation
result for the choice $j = 5$ is shown along with the true function (dashed).
Note that because of $2^5/2^7 = 1/4$ the result is obtained by using only
the values of $K_\delta$ that are depicted in the upper right quarter $[0.75, 1]^2$ of
Figure 1(b). For the oracle choice $j = 5$ the root mean square error (RMSE)
is minimal and evaluates to 0.029.

For the two nonlinear estimation methods, the approximation space $V_J$
(i.e., $V_{j_1}$ for Nonlinear Estimation I) chosen by the data-driven rule is $J = 7$
for all realizations. As to be expected, the simulation results deviate only
marginally for different choices of $c \in [1, 20]$, giving either $J = 6$ or (mostly)
$J = 7$. An implementation of Nonlinear Estimation I is based on a
level-dependent thresholding factor which is derived from the average decay of
the observed eigenvalues of $K_{\delta,J}$, ignoring the Delyon-Juditsky correction
$\sqrt{(|\lambda| - j_0)_+}$. With the threshold base level $\kappa = 0.4$ (oracle choice) Nonlinear
Estimation I produces an RMSE of 0.033. It shows a smaller error than the
linear estimator at the flat parts far off the spike, but has difficulties with
too large fluctuations close to the spike. The main underlying problem is
that after the inversion the noise in the coefficients is heterogeneous even on
the same resolution level, which is not reflected by the thresholding rule.

Setting the base level $\kappa = 1.5$ for thresholding the operator and the data,
the resulting estimator $\hat f^{II}_{\delta,\varepsilon}$ of Nonlinear Estimation II is shown in
Figure 2(b). It has by far the best performance among all three estimators
with an RMSE of 0.022. The only artefacts, from an a posteriori perspective,
are found next to the spike and stem from overlapping wavelets needed
to reconstruct the spike itself.




Fig. 2. Linear estimator (a) and Nonlinear II estimator (b).

In Cohen, Hoffmann and Reiss [5] simulations were performed for
$\varepsilon = 2 \cdot 10^{-4}$ knowing the operator $K$ ($\delta = 0$). There the respective RMSE under
oracle specifications is 0.024 (SVD), 0.023 (linear Galerkin), 0.019 (nonlinear
Galerkin). In comparison we see that roughly the same accuracy is achieved
in the case $\delta = 10^{-3}$, $\varepsilon = 10^{-4}$, which shows that the error in the operator
is less severe than the error in the data. This observation is corroborated by
further simulations for different values of $\delta$ and $\varepsilon$.

In order to understand why in this example the error in the operator
is less severe and Nonlinear Estimation II performs particularly well, let
us consider more generally the properties for thresholding a sparse operator
representation as in Figure 1(a). This is exactly the point where Assumption
5.4 comes into play with $\bar p \in (0,1)$. To keep it simple, let us focus on the
extreme case of an operator $K$ which is diagonalized by the chosen wavelet
basis with eigenvalues $2^{-|\lambda| t}$. Then $K$ satisfies Assumption 5.4 for all $(\bar s, \bar p)$
and by Theorem 5.5, choosing $\bar p$ such that $(\bar s, \bar p) \in \mathcal{P}_{\mathrm{dense}}$ and restriction (5.5)
is satisfied with equality, we infer

$\sup_{f \in V^s_p(M) \cap W^\sigma(M)} \mathcal{R}(\hat f^{II}_{\delta,\varepsilon}, f) \lesssim (\varepsilon\sqrt{|\log \varepsilon|})^{2r(s,t,d)} + (\delta\sqrt{|\log \delta|})^{\min\{(2s - d)/t,\, 2\}}$.

This rate is barely parametric in $\delta$ for not too small $s$. Hence, Nonlinear
Estimation II can profit from the usually sparse wavelet representation of an
operator, even without any specific tuning. This important feature is shared
neither by Nonlinear Estimation I nor by the linear Galerkin method.

7. Proofs. 

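7.1. Proof of Proposition 3.2. By definition, $\mathcal{R}(\hat f_{\delta,\varepsilon}, f)$ is bounded by a
constant times the sum of three terms $I + II + III$, where term $III$ comes
from $\hat f_{\delta,\varepsilon} = 0$ if $\|K_{\delta,j}^{-1}\|_{V_j \to V_j} > \tau 2^{jt}$:

$I := \|f - f_j\|^2_{L^2}$,

$II := E[\|(K_{\delta,j}^{-1} P_j g_\varepsilon - f_j) \mathbf{1}_{\{\|K_{\delta,j}^{-1}\|_{V_j \to V_j} \le \tau 2^{jt}\}}\|^2_{L^2}]$,

$III := \|f\|^2_{L^2}\, P(\|K_{\delta,j}^{-1}\|_{V_j \to V_j} > \tau 2^{jt})$.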

The term I. This bias term satisfies under Assumption 3.1

$\|f - f_j\|^2_{L^2} \lesssim 2^{-2js} \sim \max\{\delta, \varepsilon\}^{4s/(2s+2t+d)}$

by estimate (A.1) in the Appendix and thus has the right order.

The term III. For $\rho \in (0,1)$ let us introduce the event

(7.1)    $\Omega_{\rho,\delta,j} = \{\delta \|K_j^{-1} B_j\|_{V_j \to V_j} \le \rho\}$.

On the event $\Omega_{\rho,\delta,j}$ the operator $K_{\delta,j} = K_j(\mathrm{Id} + \delta K_j^{-1} B_j)$ is invertible with
$\|K_{\delta,j}^{-1}\|_{V_j \to V_j} \le (1 - \rho)^{-1} \|K_j^{-1}\|_{V_j \to V_j}$ because

(7.2)    $\|(\mathrm{Id} + \delta K_j^{-1} B_j)^{-1}\|_{V_j \to V_j} \le \sum_{m \ge 0} \|\delta K_j^{-1} B_j\|^m_{V_j \to V_j} \le (1 - \rho)^{-1}$

follows from the usual Neumann series argument. By (3.3), the choice
$\rho \le 1 - Q/\tau \in (0,1)$ thus implies $\{\|K_{\delta,j}^{-1}\|_{V_j \to V_j} > \tau 2^{jt}\} \subseteq \Omega^c_{\rho,\delta,j}$. For
$\eta = 1 - (2t + d)/(2s + 2t + d) > 0$ and sufficiently small $\delta$, we claim that

(7.3) IP(5^p,5j) < exp(-Cp(^-''22^'^) for some C7 > 0, 

which implies that term $III$ is of exponential order and hence negligible. To
prove (7.3), we infer from (3.3)

$\Omega^c_{\rho,\delta,j} \subseteq \{2^{-jd/2} \|B_j\|_{V_j \to V_j} > \rho\,\delta^{-1} Q^{-1} 2^{-j(t + d/2)}\}$

and the claim (7.3) follows from $\delta^{-1} 2^{-j(t + d/2)} \gtrsim \delta^{-\eta}$ and the following
classical bound for Gaussian random matrices:

Lemma 7.1 ([7], Theorem II. 4). There are constants (3o,c,C > such 
that 

V/3 > /3o :P(2-^'^/2||^.||^^_^^ > ^) < exp(-c/3222^^^), 
V/? > 0:P(2-^^^/2||^^.||^^_^^ < ^) < {Cpf'\ 

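The term II. Writing $P_j g_\varepsilon = P_j K f + \varepsilon P_j W$ and using the independence
of the event $\Omega^c_{\rho,\delta,j}$ from $P_j W$ (recall $B$ and $W$ are independent), we obtain

$E[\|K_{\delta,j}^{-1} P_j g_\varepsilon - f_j\|^2_{L^2} \mathbf{1}_{\{\|K_{\delta,j}^{-1}\|_{V_j \to V_j} \le \tau 2^{jt}\}} \mathbf{1}_{\Omega^c_{\rho,\delta,j}}]
\lesssim 2^{2jt} (\|P_j K f\|^2_{L^2} + \varepsilon^2 E[\|P_j W\|^2_{L^2}] + \|f_j\|^2_{L^2})\, P(\Omega^c_{\rho,\delta,j})$.

Because of $\|P_j K f\|^2_{L^2} + \|f_j\|^2_{L^2} \lesssim M^2$, $E[\|P_j W\|^2_{L^2}] \lesssim 2^{jd}$ and estimate (7.3),
we infer that the above term is asymptotically negligible. Therefore, we are
left with proving that $E[\|K_{\delta,j}^{-1} P_j g_\varepsilon - f_j\|^2_{L^2} \mathbf{1}_{\Omega_{\rho,\delta,j}}]$ has the right order. On
$\Omega_{\rho,\delta,j}$ we consider the decomposition

(7.4)    $K_{\delta,j}^{-1} P_j g_\varepsilon - f_j = ((\mathrm{Id} + \delta K_j^{-1} B_j)^{-1} - \mathrm{Id}) f_j + \varepsilon (\mathrm{Id} + \delta K_j^{-1} B_j)^{-1} K_j^{-1} P_j W$.

As for the second term on the right-hand side of (7.4), we have

$\varepsilon^2 E[\|(\mathrm{Id} + \delta K_j^{-1} B_j)^{-1} K_j^{-1} P_j W\|^2_{L^2} \mathbf{1}_{\Omega_{\rho,\delta,j}}] \lesssim \varepsilon^2 2^{2jt} 2^{jd} \lesssim \max\{\delta, \varepsilon\}^{4s/(2s+2t+d)}$,

where we used again the independence of $\Omega_{\rho,\delta,j}$ and $P_j W$ and the bound
(7.2). The first term on the right-hand side of (7.4) is treated by

$E[\|\delta K_j^{-1} B_j (\mathrm{Id} + \delta K_j^{-1} B_j)^{-1} f_j\|^2_{L^2} \mathbf{1}_{\Omega_{\rho,\delta,j}}] \lesssim \delta^2 2^{2jt} 2^{jd} \lesssim \max\{\delta, \varepsilon\}^{4s/(2s+2t+d)}$,

where we successively used the triangle inequality, $\lim_{j \to \infty} \|f_j\|_{L^2} = \|f\|_{L^2}$
from (A.1), bound (7.2), Lemma 7.1 and (3.3).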

7.2. Proof of Theorem 5.1. 

The main decomposition. The error TZ{fg^,f) is bounded by a constant 
times the sum I + II + III with 

I--=\\f-fn\\h. 

// :=E[||5max{<5,e}(/<5,e) - /7iIIl21|||^-i II <r2Jin]' 

III:=\\f\\l,n\\Kil\\v,,^V,,>r2^''). 

For the term I, we use the bias estimate (A.1) and the choice of $2^{j_1}$. The term III is analogous to the term III in the proof of Proposition 7.1; we omit the details. To treat the main term II, we establish sharp error bounds for the empirical wavelet coefficients $\langle \hat f_{\delta,\varepsilon}, \psi_\lambda\rangle$ for $|\lambda| \le j_1$.

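The empirical wavelet coefficients. We consider again the event $\Omega_{\rho,\delta,j_1}$ from (7.1) with $j_1$ in place of $j$. On that event, we have the decomposition

$\hat f_{\delta,\varepsilon} = K_{\delta,j_1}^{-1}P_{j_1}g_\varepsilon = f_{j_1} - \delta K_{j_1}^{-1}B_{j_1}f_{j_1} + \varepsilon K_{j_1}^{-1}P_{j_1}W + r^{(1)}_{\delta,j_1} + r^{(2)}_{\delta,\varepsilon,j_1},$

with

$r^{(1)}_{\delta,j_1} = \sum_{n \ge 2}(-\delta K_{j_1}^{-1}B_{j_1})^n f_{j_1},$

$r^{(2)}_{\delta,\varepsilon,j_1} = -\varepsilon\delta K_{j_1}^{-1}B_{j_1}(\mathrm{Id} + \delta K_{j_1}^{-1}B_{j_1})^{-1}K_{j_1}^{-1}P_{j_1}W.$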
In the Appendix we derive the following properties. 
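Lemma 7.2. Let $|\lambda| \le j_1$ and $\rho \in (0, 1 - Q/\tau)$. Under Assumption 3.1 the following decomposition holds:

$\delta\langle K_{j_1}^{-1}B_{j_1}f_{j_1}, \psi_\lambda\rangle = \delta 2^{|\lambda|t}\|f_{j_1}\|_{L^2}c_\lambda\xi_\lambda,$

$\varepsilon\langle K_{j_1}^{-1}P_{j_1}W, \psi_\lambda\rangle = \varepsilon 2^{|\lambda|t}\tilde c_\lambda\tilde\xi_\lambda,$

$\langle r^{(1)}_{\delta,j_1}, \psi_\lambda\rangle = \delta^2 2^{|\lambda|t}\|f_{j_1}\|_{L^2}2^{j_1(t+d)}\zeta_{\lambda,j_1},$

$\langle r^{(2)}_{\delta,\varepsilon,j_1}, \psi_\lambda\rangle = \delta\varepsilon 2^{|\lambda|t}2^{j_1(t+d/2)}\tilde\zeta_{\lambda,j_1},$

on $\Omega_{\rho,\delta,j_1}$, where $|c_\lambda|, |\tilde c_\lambda| \lesssim 1$, $\xi_\lambda$ and $\tilde\xi_\lambda$ are standard Gaussian variables and $\zeta_{\lambda,j_1}$, $\tilde\zeta_{\lambda,j_1}$ are random variables satisfying

$\max\{\mathbb{P}(\{|\zeta_{\lambda,j_1}| \ge \beta\} \cap \Omega_{\rho,\delta,j_1}), \mathbb{P}(\{|\tilde\zeta_{\lambda,j_1}| \ge \beta\} \cap \Omega_{\rho,\delta,j_1})\} \le \exp(-c\beta 2^{2j_1 d})$

for all $\beta \ge \beta_0$ with some (explicitly computable) constants $\beta_0, c > 0$.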

From this explicit decomposition we shall derive the fundamental deviation bound

(7.5) $\mathbb{P}(\{2^{-|\lambda|t}|\langle \hat f_{\delta,\varepsilon}, \psi_\lambda\rangle - \langle f_{j_1}, \psi_\lambda\rangle| \ge \beta\max\{\delta,\varepsilon\}\} \cap \Omega_{\rho,\delta,j_1}) \le 4\exp(-C\beta\min\{\beta, 2^{j_1 d}\})$

for all $|\lambda| \le j_1$ and some explicitly computable constant $C > 0$. Once this is achieved, we are in the standard signal detection setting with exponentially tight noise. The asserted bound for term II is then proved exactly following the lines in [8]; see also the heteroskedastic treatment in [14]. The only fine point is that we estimate the Galerkin projection $f_{j_1}$, not $f$, but by estimate (A.2) in the Appendix $\|f_{j_1}\|_{B^s_{p,p}} \lesssim \|f\|_{B^s_{p,p}}$.

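It remains to establish the deviation bound (7.5). By Lemma 7.2, that probability is bounded by the sum of the four terms

$P_I := \mathbb{P}(\|f_{j_1}\|_{L^2}c_\lambda|\xi_\lambda| \ge \beta/4),$

$P_{II} := \mathbb{P}(\tilde c_\lambda|\tilde\xi_\lambda| \ge \beta/4),$

$P_{III} := \mathbb{P}(\{\delta 2^{j_1(t+d)}\|f_{j_1}\|_{L^2}\zeta_{\lambda,j_1} \ge \beta/4\} \cap \Omega_{\rho,\delta,j_1}),$

$P_{IV} := \mathbb{P}(\{\delta 2^{j_1(t+d/2)}\tilde\zeta_{\lambda,j_1} \ge \beta/4\} \cap \Omega_{\rho,\delta,j_1}).$

We obtain the bounds $P_I \le \exp(-c_I\beta^2)$, $P_{II} \le \exp(-c_{II}\beta^2)$ with some constants $c_I, c_{II} > 0$ by Gaussian deviations. The large deviation bound on $\zeta_{\lambda,j_1}$ in Lemma 7.2 implies with a constant $c_{III} > 0$

$P_{III} \le \exp(-c_{III}\beta 2^{-j_1(t+d-2d)}\delta^{-1}).$

Equally, $P_{IV} \le \exp(-c_{IV}\beta 2^{-j_1(t+d/2-2d)}\delta^{-1})$ follows, which proves (7.5) with some $C > 0$ depending on $c_I$ to $c_{IV}$ since $\delta^{-1} \gtrsim 2^{j_1 t}$ by construction.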

7.3. Proof of Theorem 5.5. The proof of Theorem 5.5 is a combination of a deviation bound for the hard-thresholding estimator in $H^t$-loss together with an error estimate in operator norm. The following three estimates are the core of the proof and seem to be new. They are proved in the Appendix.

Proposition 7.3 (Deviation in i?*-norm). Assume k > A^/tJd, 2^ < 
e~^/* and that s > 0, p > are in Pdense- Then there exist constants cq, Ro > 



such that for all functions g G -B^J^ the hard-thresholding estimate 5, 

(4.5) satisfies with m := max{||P75t||^.+t, 

Vr? > r?o :P(T(e)-^(^'*''^)||5e - Pjg\\m > vm'~''^''''''^) < e'^"^' + ^-Vs-^A^ 
Vi? > iio : nUe - PjgWm >m + R)< e''Vi6-dA^-4_ 

Proposition 7.4 (Estimation in operator norm, L^-bound). Suppose 
K? > 32max{(i/t, 1 + d{2t + d)/{At{t + d))}. Grant Assumption 5.4 with s > 
0,p>0 satisfying restriction (5.5). Then 

E[\\Ks - < (5v^bS)2^(^"*''^). 

Proposition 7.5 (Estimation in operator norm, deviation bound). Sup- 
pose Koo '■= sup^ ;^ 2l'^l*|(irV'/x5 < 00. Then for all r] > 

FiWks - Kj\\^Vj,Mr,2)^H^ ^ ^Ol log^r^/^ + ^) < 5,min{.V2-2d/(t+d),l/2,,}^ 
with qi := 2"^ {6^/\Tog5\)^^^^~^'^^ and a constant cq depending only on K^q. 

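For $\rho \in (0, 1)$ we introduce the event

(7.6) $\Omega_{\rho,\delta,J} := \{\|\hat K_\delta - K_J\|_{(V_J, \|\cdot\|_{L^2}) \to H^t} \le \rho\|K_J^{-1}\|^{-1}_{(V_J, \|\cdot\|_{H^t}) \to L^2}\}.$

The Neumann series representation implies that on $\Omega_{\rho,\delta,J}$ the random operator

$\hat K_\delta : (V_J, \|\cdot\|_{L^2}) \to (V_J, \|\cdot\|_{H^t})$

is invertible with norm $\|\hat K_\delta^{-1}\| \le (1 - \rho)^{-1}\|K_J^{-1}\|$. For the subsequent choice $\rho \in (0, 1 - \|K_J^{-1}\|/\tau)$ this bound is smaller than the cut-off value $\tau$. On $\Omega_{\rho,\delta,J}$ we bound $\|\hat f^{II}_{\delta,\varepsilon} - f\|_{L^2}$ by

$\|\hat K_\delta^{-1}(\hat g_\varepsilon - P_J g)\|_{L^2} + \|(\hat K_\delta^{-1} - K_J^{-1})P_J g\|_{L^2} + \|f_J - f\|_{L^2}.$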
The last term is the bias and has the right order by estimate (A.1) and the restriction (5.4) on a. The first two stochastic errors are further bounded by

$\|\hat K_\delta^{-1}\|_{(V_J, \|\cdot\|_{H^t}) \to L^2}(\|\hat g_\varepsilon - P_J g\|_{H^t} + \|\hat K_\delta - K_J\|_{(V_J, \|\cdot\|_{B^s_{p,p}}) \to H^t}\|f_J\|_{B^s_{p,p}}).$

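Because of $\|\hat K_\delta^{-1}\|_{(V_J, \|\cdot\|_{H^t}) \to L^2} \le \tau$, the assertion on $\Omega_{\rho,\delta,J}$ follows from the standard risk estimate in $H^t$-loss (cf. Proposition 7.3 or [14], Theorem 3.1)

$\mathbb{E}[\|\hat g_\varepsilon - P_J g\|_{H^t}^2] \lesssim T(\varepsilon)^{2r(s,t,d)},$

from the operator norm estimate of Proposition 7.4 and $\|f_J\|_{B^s_{p,p}} \lesssim \|f\|_{B^s_{p,p}}$; see (A.2).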

On the complement {^^pSjY the risk of //^, conditional on is uni- 
formly bounded thanks to the cut-off rule in the construction. Assumption 
5.4 and the symmetry of K imply 2l^l*| V'a)| < 2-II^I~I^II("+'^/2-'^/p). 
Consequently, Proposition 7.5 is applicable and a sufficiently large choice of 
K and a sufficiently small choice of qi by means of an appropriate choice of 
the constant in the specification of 2^ give P((f2^^^ jY) < 5"^ , which ends the 
proof. 

APPENDIX 

A.l. Proof of Lemma 7.2. 

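First equality. By Assumption 3.1, $K_{j_1}$ is symmetric and thus $\delta\langle K_{j_1}^{-1}B_{j_1}f_{j_1}, \psi_\lambda\rangle = \delta\langle B_{j_1}f_{j_1}, K_{j_1}^{-1}\psi_\lambda\rangle$. This is a centered Gaussian random variable with variance $\delta^2\|f_{j_1}\|_{L^2}^2\|K_{j_1}^{-1}\psi_\lambda\|_{L^2}^2$. Assumption 3.1 gives $\|K_{j_1}^{-1}\psi_\lambda\|_{L^2}^2 \lesssim \|\psi_\lambda\|_{H^t}^2 \lesssim 2^{2|\lambda|t}$ (see Appendix A.6) and the first equality follows from estimate (A.1).

Second equality. We write $\varepsilon\langle K_{j_1}^{-1}P_{j_1}W, \psi_\lambda\rangle = \varepsilon\langle W, K_{j_1}^{-1}\psi_\lambda\rangle$, which is centered Gaussian with variance $\varepsilon^2\|K_{j_1}^{-1}\psi_\lambda\|_{L^2}^2$, and the foregoing arguments apply.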

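Third equality. On $\Omega_{\rho,\delta,j_1}$ the term $|\langle r^{(1)}_{\delta,j_1}, \psi_\lambda\rangle|$ equals

$|\langle(\delta K_{j_1}^{-1}B_{j_1})^2(\mathrm{Id} + \delta K_{j_1}^{-1}B_{j_1})^{-1}f_{j_1}, \psi_\lambda\rangle|$

$= \delta^2|\langle B_{j_1}K_{j_1}^{-1}B_{j_1}(\mathrm{Id} + \delta K_{j_1}^{-1}B_{j_1})^{-1}f_{j_1}, K_{j_1}^{-1}\psi_\lambda\rangle|$

$\le \delta^2\|B_{j_1}\|^2_{V_{j_1} \to V_{j_1}}\|K_{j_1}^{-1}\|_{V_{j_1} \to V_{j_1}}\|(\mathrm{Id} + \delta K_{j_1}^{-1}B_{j_1})^{-1}\|_{V_{j_1} \to V_{j_1}} \times \|f_{j_1}\|_{L^2}\|K_{j_1}^{-1}\psi_\lambda\|_{L^2}$

$\lesssim \delta^2\|B_{j_1}\|^2_{V_{j_1} \to V_{j_1}}2^{j_1 t}2^{|\lambda|t},$

where we successively applied the Cauchy-Schwarz inequality, (3.3), (7.2) on $\Omega_{\rho,\delta,j_1}$ and (A.1) together with the same arguments as before to bound $\|K_{j_1}^{-1}\psi_\lambda\|_{L^2}$. Lemma 7.1 yields the result.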

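Fourth equality. Since $W$ and $B$ are independent, we have that, conditional on $B$, the random variable $\langle r^{(2)}_{\delta,\varepsilon,j_1}, \psi_\lambda\rangle 1_{\Omega_{\rho,\delta,j_1}}$ is centered Gaussian with conditional variance

$\delta^2\varepsilon^2\|(K_{j_1}^{-1}B_{j_1}(\mathrm{Id} + \delta K_{j_1}^{-1}B_{j_1})^{-1}K_{j_1}^{-1})^*\psi_\lambda\|_{L^2}^2 1_{\Omega_{\rho,\delta,j_1}}$

$= \delta^2\varepsilon^2\|(B_{j_1}(\mathrm{Id} + \delta K_{j_1}^{-1}B_{j_1})^{-1}K_{j_1}^{-1})^*K_{j_1}^{-1}\psi_\lambda\|_{L^2}^2 1_{\Omega_{\rho,\delta,j_1}}$

$\lesssim \delta^2\varepsilon^2\|B_{j_1}(\mathrm{Id} + \delta K_{j_1}^{-1}B_{j_1})^{-1}K_{j_1}^{-1}\|^2_{V_{j_1} \to V_{j_1}}2^{2|\lambda|t}1_{\Omega_{\rho,\delta,j_1}} \lesssim \delta^2\varepsilon^2\|B_{j_1}\|^2_{V_{j_1} \to V_{j_1}}2^{2j_1 t}2^{2|\lambda|t}$

by (3.3) and estimate (7.2), which is not affected when passing to the adjoint, up to an appropriate modification of $\Omega_{\rho,\delta,j_1}$ incorporating $B_{j_1}^*$. We conclude by applying Lemma 7.1, which is also not affected when passing to the adjoint.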

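A.2. Proof of Proposition 7.3. Denote by $g_\lambda$ and $g_\lambda^\varepsilon$ the wavelet coefficients of $g$ and $g_\varepsilon$. We have

$\|\hat g_\varepsilon - P_J g\|_{H^t}^2 \sim \sum_{|\lambda| \le J} 2^{2|\lambda|t}(g_\lambda^\varepsilon 1_{\{|g_\lambda^\varepsilon| \ge T(\varepsilon)\}} - g_\lambda)^2.$

The usual decomposition yields a bound of the right-hand side by the sum of four terms $I + II + III + IV$ with

$I := \sum 2^{2|\lambda|t}(g_\lambda^\varepsilon - g_\lambda)^2 1_{\{|g_\lambda| \ge (1/2)T(\varepsilon)\}},$

$II := \sum 2^{2|\lambda|t}(g_\lambda^\varepsilon - g_\lambda)^2 1_{\{|g_\lambda^\varepsilon - g_\lambda| > (1/2)T(\varepsilon)\}},$

$III := \sum 2^{2|\lambda|t}(g_\lambda)^2 1_{\{|g_\lambda^\varepsilon - g_\lambda| > T(\varepsilon)\}},$

$IV := \sum 2^{2|\lambda|t}(g_\lambda)^2 1_{\{|g_\lambda| < 2T(\varepsilon)\}},$

and where the sums in $\lambda$ range through the set $\{|\lambda| \le J\}$.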



The term IV. This approximation term is bounded by 
E2^^* E {2T{e)f-Pmin{{g'r,{2T{e)r} 

3<J |A|=j 

< T{ef-P E 2''* min{||Pj9r„.+,2-^(^+*+'^/2-'>'/f)f , 2^'^T(e)P} 

3<J 
which is of order T(e)22^(2t+d) ^^^^

Therefore, we obtain IV < \\Pjgf~Z^''^''^'^T{ef'^'^^''^l 

The term I. For this second approximation term we need to introduce 
the random variables 

£-2 

■= #{|A|=,,|,A|>(i/2)r(.)} - ^^')'lm^l>(V2)T(.)}. 

Using l||gA|>(i/2)r(£)} ^ |25''^/'?"(e)|^, we obtain for the least favorable case 
l/p= 1/2 + s/{2t + d) that term / is bounded by 

^22J*e2^j ^ l{|gA|>(i/2)r(£)} 

|A|=i 

<^22^V^,min|r(.)-P ^ |/r,2^-4 

3<J 

Now observe that, as for term IV , the following inequality holds: 

V min{r(e)-P2-^^(^+*+'^/2-d/p)p+2jt||p^^||P 2^-(2t+d)} ^ £22^(2*+^) 

with 2^'(2.+2t+d) _j^i^|||p^^||P^^^^(g)-2^2-^(2s+2t+d)|_ 



p,p 



By definition, each has a normalized (to expectation 1) x^-distribution 
and so has any convex combination aj^j. For the latter we infer IP(X)j ^jCj ^ 
r/2) < e~''^/2, > 1, by regarding the extremal case of only one degree of free- 
dom. Consequently, we obtain ¥{cie~'^2~^'^'^^^'^^ I > rf) < e"''^/^ with some 
constant ci > 0. Substituting for j, we conclude with another constant C2 > 
that 

ni>W\\Pj9rBs (T(e)||Pjg|r?/?)2-(«'*''^))<exp(-C2r?2|loge|) = e'=^^'. 

PiP p,p 

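The terms II and III. For these deviation terms we obtain by independence and a Gaussian tail estimate

$\mathbb{P}(\{II = 0\} \cap \{III = 0\}) \ge \mathbb{P}(|g_\lambda^\varepsilon - g_\lambda| \le \tfrac{1}{2}T(\varepsilon) \text{ for all } |\lambda| \le J) \ge (1 - \exp(-\kappa^2|\log\varepsilon|/8))^{\#V_J}.$

Using $\#V_J \sim 2^{Jd} \lesssim \varepsilon^{-d/t}$, we derive $\mathbb{P}(II + III > 0) \lesssim \varepsilon^{\kappa^2/8 - d/t}$.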
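The Gaussian tail estimate used for the terms II and III can be checked numerically. The sketch below is illustrative only and assumes the threshold has the form $T(\varepsilon) = \kappa\varepsilon\sqrt{|\log\varepsilon|}$, which is consistent with the factor $\exp(-\kappa^2|\log\varepsilon|/8)$ above but is not restated in this passage. The function names are hypothetical; `hard_threshold` shows the keep-or-kill rule applied to a list of coefficients.

```python
import math


def threshold(eps, kappa):
    """T(eps) = kappa * eps * sqrt(|log eps|) (assumed form of the threshold)."""
    return kappa * eps * math.sqrt(abs(math.log(eps)))


def hard_threshold(coefficients, level):
    """Keep coefficients with |c| >= level and set the others to zero."""
    return [c if abs(c) >= level else 0.0 for c in coefficients]


def exceed_probability(eps, kappa):
    """P(|eps * xi| > T(eps) / 2) for a standard Gaussian variable xi."""
    x = threshold(eps, kappa) / (2.0 * eps)
    # For a standard Gaussian, P(|xi| > x) = erfc(x / sqrt(2)).
    return math.erfc(x / math.sqrt(2.0))


def tail_bound(eps, kappa):
    """The bound exp(-kappa^2 |log eps| / 8) appearing in the proof."""
    return math.exp(-kappa ** 2 * abs(math.log(eps)) / 8.0)
```

For any noise level `eps` in (0, 1) and any `kappa > 0`, `exceed_probability(eps, kappa)` does not exceed `tail_bound(eps, kappa)`, which is the single-coefficient estimate that is then raised to the power $\#V_J$.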
The first assertion. We obtain for some $\eta_0 \ge 1$ and all $\eta \ge \eta_0$:
n\\9e - PjgWm > r^mi-'-(^'*'^)T(e)^(^A'^)) 



< ¥{I > ir7'||Pj5HlVT^''*''^V(e)2-(^'*''^)) + ¥{II + III > 0) 

-2r( 
2'/ 11-^ JUll ^s+t 



lr,2||p,„||2-2''(5,i,'^)^(^g>)2r(sA 



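The second assertion. We show that the deviation terms are well bounded in probability. While obviously $III \lesssim \|P_J g\|_{H^t}^2 \lesssim \|P_J g\|_{B^{s+t}_{p,p}}^2$ holds,

$\mathbb{E}[II] \le \sum_{|\lambda| \le J} 2^{2|\lambda|t}\,\mathbb{E}[(g_\lambda^\varepsilon - g_\lambda)^4]^{1/2}\,\mathbb{P}(|g_\lambda^\varepsilon - g_\lambda| > T(\varepsilon)/2)^{1/2}$

is bounded in order by $2^{J(2t+d)}\varepsilon^2\exp(-\kappa^2|\log\varepsilon|/8)^{1/2} \lesssim \varepsilon^{\kappa^2/16 - d/t}$ because of $2^J \lesssim \varepsilon^{-1/t}$. In the same way we find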

|A|<J 

By Chebyshev's inequality, we infer $\mathbb{P}(II \ge R^2) \lesssim \varepsilon^{\kappa^2/16 - d/t}R^{-4}$ for $R > 0$. Since the above estimates of the approximation terms yield superoptimal deviation bounds, the estimate follows for sufficiently large $R$.

A.3. Proof of Proposition 7.4. The wavelet characterization of Besov spaces (cf. Appendix A.6) together with Hölder's inequality for $p^{-1} + q^{-1} = 1$ yields



\\Ks-Kj\ 



~ sup 

||(aM)||fP=l 



V|/.|<j 



X sup iii^jV'/.irii+t'^^ ^11(^5 -i^j)^^!^*- 



Due to Assumption 5.4 the last £'^-norm can be estimated in order by 

||^2j(-(s-<i/2)+{S+d/2-d/j3){l-r(SAd)))^ .^^^^^^ 

which is of order 1 whenever restriction (5.5) is fulfilled. 
By construction, $\hat K_\delta\psi_\mu$ is the hard-thresholding estimator for $K_J\psi_\mu$ given the observation of $K_{\delta,J}\psi_\mu$, which is $K_J\psi_\mu$ corrupted by white noise of level $\delta$. Therefore Proposition 7.3 applied to $K\psi_\mu$ and $\delta$ gives for any $\eta \ge \eta_0$:

p,p 

By estimating the probability of the supremum by the sum over the probabilities, we obtain from above with a constant $c_1 > 0$ for all $\eta \ge \eta_0$:

\ II WiJp^p/ 

\^J.\<J 

< ^corf-d/{t+d) _|_ ^K^/&-d{2t+d)/{t{t+d)) 

For a sufficiently large $\eta_1 > 0$ depending only on $c_0$, $d$ and $t$, with $\gamma := \kappa^2/8 - d(2t + d)/(t(t + d)) > 0$, we thus obtain

p,p 

By the above bound on the operator norm and Hölder's inequality for $q := \gamma/2 > 2$ and $p^{-1} + q^{-1} = 1$ together with the second estimate in Proposition 7.3, we find for some constant $R_0 > 0$:

E[\\Ks - ^j\\fvj,Mss^^)^Ht'^{\\Ks-Kj\\^y^^^^,^^^^ )^^t>r?ir(5)'-(--.*.d)}] 



/ l-OO ^ \ 1/p 



<max{5('''/i6-2'^/*)/M}52 
which is of order $\delta^2$ by assumption on $\kappa$ and the assertion follows.



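A.4. Proof of Proposition 7.5. For $|\mu|, |\lambda| \le J$ we have for the entries in the wavelet representation

$|(\hat K_\delta)_{\mu,\lambda} - K_{\mu,\lambda}| = |K_{\mu,\lambda}|1_{\{|(K_\delta)_{\mu,\lambda}| \le T(\delta)\}} + \delta|B_{\mu,\lambda}|1_{\{|(K_\delta)_{\mu,\lambda}| > T(\delta)\}}.$

A simple rough estimate yields

$|(\hat K_\delta)_{\mu,\lambda} - K_{\mu,\lambda}| \le 2T(\delta) + |K_{\mu,\lambda}|1_{\{|(K_\delta - K)_{\mu,\lambda}| > T(\delta)\}} + \delta|B_{\mu,\lambda}|.$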
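The entrywise identity and the rough estimate above can be verified on simulated entries. The following sketch assumes that $\hat K_\delta$ is obtained by hard thresholding each entry of $K_\delta = K + \delta B$ at level $T(\delta) = \kappa\delta\sqrt{|\log\delta|}$, and that the entries of $B$ are standard Gaussian. The threshold form, the uniform sampling of the true entries and all function names are assumptions made for illustration.

```python
import math
import random


def threshold(delta, kappa):
    """T(delta) = kappa * delta * sqrt(|log delta|) (assumed form of the threshold)."""
    return kappa * delta * math.sqrt(abs(math.log(delta)))


def entry_terms(k, b, delta, level):
    """Return (error, exact_form, rough_bound) for one matrix entry."""
    observed = k + delta * b                  # noisy entry of K_delta
    kept = abs(observed) > level
    estimate = observed if kept else 0.0      # hard-thresholded entry
    error = abs(estimate - k)
    # Exact form: delta * |B| if the entry is kept, |K| if it is killed.
    exact_form = delta * abs(b) if kept else abs(k)
    noise_large = delta * abs(b) > level
    rough_bound = 2.0 * level + (abs(k) if noise_large else 0.0) + delta * abs(b)
    return error, exact_form, rough_bound


def worst_gaps(samples, delta, kappa, seed=0):
    """Largest deviation from the identity and largest excess over the rough bound."""
    rng = random.Random(seed)
    level = threshold(delta, kappa)
    worst_identity, worst_excess = 0.0, 0.0
    for _ in range(samples):
        k = rng.uniform(-1.0, 1.0)            # hypothetical true entry
        b = rng.gauss(0.0, 1.0)               # white noise entry
        error, exact_form, rough_bound = entry_terms(k, b, delta, level)
        worst_identity = max(worst_identity, abs(error - exact_form))
        worst_excess = max(worst_excess, error - rough_bound)
    return worst_identity, worst_excess
```

Both returned values stay at the level of floating-point rounding for any choice of `delta` in (0, 1) and `kappa > 0`, since the two displays hold entry by entry and not only on average.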
We bound the operator norm by the corresponding Hilbert-Schmidt norm and use $K_\infty < \infty$ to obtain

ll-f'^<5-^j||?Vj,||.|li2)^H* 

M,\M<J 

<2^Jit+d)^(^Sf^#{6\{Ks-K)^,x\>Ti6)} + 6^2''-^' E ^m,a> 

ImI.|a|<J 

where the cardinality is taken for multi-indices $(\lambda, \mu)$ such that $|\lambda|, |\mu| \le J$. The first term is of order $|\log\delta|^{-1}$. In view of $(K_\delta - K)_{\mu,\lambda} = \delta B_{\mu,\lambda}$, the second term is a binomial random variable with expectation $2^{2Jd}\,\mathbb{P}(|B_{\mu,\lambda}| \ge \kappa|\log\delta|^{1/2}) \lesssim \delta^{-2d/(t+d)+\kappa^2/2}$. An exponential moment bound for the binomial distribution yields

For the last term, we use an exponential bound for the deviations of a 
normalized x^-distribution, as before, to infer from 2'^^*"'"'^^ < T{5) that 

P E ^',A > ^) < exp(-2-2^(*+'^)-i5-%) < 5^/291 

holds, which gives the result. 

A.5. Proof of Theorem 5.2. To avoid singularity of the underlying probability measures we only consider the subclass of parameters $(K, f)$ such that $Kf = y_0$ for some fixed $y_0 \in L^2$, that is, $\mathcal{F}_0 := \{(K, f) \mid f = K^{-1}y_0, K \in \mathcal{K}\}$, where $\mathcal{K} = \mathcal{K}_t(C)$ abbreviates the class of operators under consideration. We shall henceforth keep $y_0$ fixed and refer to the parameter $(K, f)$ equivalently just by $K$.

The likelihood $\Lambda(\cdot)$ of $\mathbb{P}^2$ under the law $\mathbb{P}^1$ corresponding to the parameters $K^i$, $i = 1, 2$, is

$\Lambda(K^2 | K^1) = \exp(\delta^{-1}\langle K^2 - K^1, B\rangle_{HS} - \tfrac{1}{2}\delta^{-2}\|K^2 - K^1\|_{HS}^2)$

in terms of the scalar product and norm of the Hilbert space $HS(L^2)$ of Hilbert-Schmidt operators on $L^2$ and with a Gaussian white noise operator $B$. In particular, the Kullback-Leibler divergence between the two measures equals $\tfrac{1}{2}\delta^{-2}\|K^1 - K^2\|_{HS}^2$ and the two models remain contiguous for $\delta \to 0$ as long as the Hilbert-Schmidt norm of the difference remains of order $\delta$.

Let us fix the parameter $f_0 = \psi_{-1,0} = 1$ and the operator $K^0$ which, in a wavelet basis $(\psi_\lambda)_\lambda$, has diagonal form $K^0 = \mathrm{diag}(2^{-(|\lambda|+1)t})$. Then $K^0$ is ill-posed of degree $t$ and trivially obeys all the mapping properties imposed. Henceforth, $y_0 := K^0 f_0 = 1$ remains fixed.

For any k = 0,. . . , 2'^'^ — 1, introduce the symmetric perturbation = 
^) with vanishing coefficients except for H'^^^ ^j^^ = 1 and H^^j^^ = 
1. Put K'' = + 7if^ for some 7 > 0. By setting 'y:=5Jwe enforce \\K^ — 
K^WiiS = SJ. For fs := {K^)-^yo, we obtain 

/,-/o = ((/^^)-i-(K°)^i)yo 

= 7(i^")-Vj,o. 
Now observe that H'^ trivially satisfies the conditions 

M\H'\\^s <2^(*+^+'^/2-'^/f). 

p,p ^ p,p 

This implies that for -y2'^^^~^^~^'^^'^~^^P') sufficiently small K'^ inherits the map- 
ping properties from K^. Hence, 

\\fe-fo\\L^~lU.J,o\\Ht=l2-^\ 

Wfe - /ob^„ ~7llV'J,olU^+; =72-'(*+^+'^/'-''/P) 

follows. In order to apply the classical lower bound proof in the sparse case ([15], Theorem 2.5.3) and thus to obtain the logarithmic correction, we nevertheless have to show that $f_\theta - f_0$ is well localized. Using the fact that $((H^\theta)^2)_{\lambda,\mu} = 1$ holds for coordinates $\lambda = \mu = (0, 0)$ and $\lambda = \mu = (J, k)$, but vanishes elsewhere, we infer from the Neumann series representation

$f_\theta - f_0 = \sum_{m=1}^\infty (-\gamma H^\theta)^m f_0 = \sum_{n=1}^\infty \gamma^{2n}f_0 - \sum_{n=0}^\infty \gamma^{2n+1}\psi_{J,k} = \frac{\gamma}{1 - \gamma^2}(\gamma f_0 - \psi_{J,k}).$

Consequently, the asymptotics for $\gamma \to 0$ are governed by the term $-\gamma\psi_{J,k}$, which is well localized. The choice $2^J \lesssim \gamma^{-1/(t+s+d/2-d/p)}$ ensures that $\|f_\theta\|_{B^s_{p,p}}$ remains bounded and we conclude by usual arguments; see Chapter 2 in [15] or the lower bound in [17].
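The closed form of the Neumann series above can be checked by direct summation. This is a toy sketch: it works only in the two-dimensional span of $f_0$ and $\psi_{J,k}$, where the perturbation $H^\theta$ acts as a swap of the two coordinates, and it sums the series exactly as displayed. The coordinate convention (first entry for $f_0$, second for $\psi_{J,k}$) and the function names are assumptions made for illustration.

```python
def apply_swap(vector):
    """Apply H restricted to span{f_0, psi_{J,k}}: it swaps the two coordinates."""
    a, b = vector
    return (b, a)


def neumann_partial_sum(gamma, terms):
    """Sum_{m=1}^{terms} (-gamma H)^m f_0 with f_0 = (1, 0)."""
    total = [0.0, 0.0]
    current = (1.0, 0.0)
    for _ in range(terms):
        swapped = apply_swap(current)
        current = (-gamma * swapped[0], -gamma * swapped[1])
        total[0] += current[0]
        total[1] += current[1]
    return (total[0], total[1])


def closed_form(gamma):
    """gamma / (1 - gamma^2) * (gamma f_0 - psi_{J,k}) in the same coordinates."""
    factor = gamma / (1.0 - gamma ** 2)
    return (factor * gamma, -factor)
```

For small `gamma` the second coordinate is close to `-gamma`, which matches the statement that the leading term is $-\gamma\psi_{J,k}$, while the $f_0$ coordinate is only of order `gamma ** 2`.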
A.6. Some tools from approximation theory. The material gathered here is classical; see, for example, [4]. We call a multiresolution analysis on $L^2(\mathcal{D})$ an increasing sequence of subspaces $(V_J)_{J \ge 0}$ generated by orthogonal wavelets $(\psi_\lambda)_{|\lambda| \le J}$, where the multi-index $\lambda = (j, k)$ comprises the resolution level $|\lambda| := j \ge 0$ and the $d$-dimensional location parameter $k$. We use the fact that for regular domains $\#V_J \sim 2^{Jd}$ and denote the $L^2$-orthogonal projection onto $V_J$ by $P_J$.

Given an $s_+$-regular multiresolution analysis, $s_+ \in \mathbb{N}$, an equivalent norming of the Besov space $B^s_{p,p}$, $s \in (-s_+, s_+)$, $p > 0$, is given in terms of weighted wavelet coefficients:

$\|f\|_{B^s_{p,p}} \sim \Big(\sum_{j=-1}^\infty 2^{j(s+d/2-d/p)p}\sum_k |\langle f, \psi_{jk}\rangle|^p\Big)^{1/p}.$

For $p < 1$ the Besov spaces are only quasi-Banach spaces, but still coincide with the corresponding nonlinear approximation spaces; see Section 30 in [4]. If $s$ is not an integer or if $p = 2$, the space $B^s_{p,p}$ equals the $L^p$-Sobolev space, which for $p = 2$ is denoted by $H^s$. The Sobolev embedding generalizes to $B^s_{p,p} \subseteq B^{s'}_{p',p'}$ for $s \ge s'$ and $s - d/p \ge s' - d/p'$.

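Direct and inverse estimates are the main tools in approximation theory. Using the equivalent norming, they are readily obtained for any $-s_+ < s' \le s < s_+$:

$\inf_{h_j \in V_j}\|f - h_j\|_{B^{s'}_{p,p}} \lesssim 2^{-(s-s')j}\|f\|_{B^s_{p,p}},$

$\forall h_j \in V_j: \|h_j\|_{B^s_{p,p}} \lesssim 2^{(s-s')j}\|h_j\|_{B^{s'}_{p,p}}.$

In [5] it is shown that under Assumption 3.1 $\|K_j^{-1}\|_{H^t \to L^2} \lesssim 1$ and we infer from an inverse estimate $\|K_j^{-1}\|_{L^2 \to L^2} \lesssim 2^{jt}$.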

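Let us finally bound $\|f - f_j\|$ and $\|f_j\|$ for diverse norms. By definition, $f_j$ is the orthogonal projection of $f$ onto $V_j$ with respect to the scalar product $\langle K\cdot, \cdot\rangle$ such that $\|K^{1/2}(f - f_j)\|_{L^2} \le \|K^{1/2}(\mathrm{Id} - P_j)f\|_{L^2}$ and by Assumption 3.1 $\|f - f_j\|_{H^{-t/2}} \lesssim \|(\mathrm{Id} - P_j)f\|_{H^{-t/2}}$. Using the equivalent (weighted) norms, we find $\|f - f_j\|_{B^{-t/2}_{p,p}} \lesssim \|(\mathrm{Id} - P_j)f\|_{B^{-t/2}_{p,p}}$ for any $p$. By a direct estimate, we obtain $\|(\mathrm{Id} - P_j)f\|_{B^{-t/2}_{p,p}} \lesssim 2^{-j(s+t/2)}\|f\|_{B^s_{p,p}}$ and hence

$\|f - f_j\|_{B^{-t/2}_{p,p}} \lesssim 2^{-j(s+t/2)}\|f\|_{B^s_{p,p}},$

$\|P_j f - f_j\|_{B^{-t/2}_{p,p}} \lesssim 2^{-j(s+t/2)}\|f\|_{B^s_{p,p}}.$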
An inverse estimate, applied to the latter inequality, yields together with the Sobolev embeddings ($p \le 2$)

(A.1) $\|f - f_j\|_{L^2} \le \|f - P_j f\|_{L^2} + \|P_j f - f_j\|_{L^2} \lesssim 2^{-j(s+d/2-d/p)}\|f\|_{B^s_{p,p}}.$




Merely an inverse estimate yields the stability estimate

(A.2) $\|f_j\|_{B^s_{p,p}} \le \|f_j - P_j f\|_{B^s_{p,p}} + \|P_j f\|_{B^s_{p,p}} \lesssim \|f\|_{B^s_{p,p}}.$